Pairwise, Magnitude, or Stars: What's the Best Way for Crowds to Rate?

Authors

  • Alessandro Checco
  • Gianluca Demartini
Abstract

We compare three popular techniques for rating content: the ubiquitous five-star rating, the less used pairwise comparison, and magnitude estimation, which was recently introduced to crowdsourcing. Each system has specific advantages and disadvantages in terms of required user effort, achievable accuracy in predicting user preferences, and number of ratings required. We design an experiment in which the three techniques are compared in an unbiased way. We collected 39,000 ratings on a popular crowdsourcing platform, allowing us to release a dataset that will be useful for many related studies on user rating techniques.

Introduction

Users rating content on the Web is a key activity for a variety of applications, from recommender systems to information retrieval system evaluation. The most common way for users to rate content is star rating. In 2006, Netflix released a dataset containing 100 million movie ratings using the star system, offering a $1M prize to improve their recommender system (Bennett and Lanning 2007).

Alternatives to the star rating approach exist. For example, magnitude estimation, originally developed for psychophysical measurement (Stevens 1966), has recently been proposed for collecting crowdsourced ratings for information retrieval evaluation (Turpin et al. 2015). With this method users may rate content with any numerical value, so they are always free to give a higher or lower score than they gave to the content they have seen so far. Pairwise comparison has a long history, but it requires a high number of comparisons to achieve good accuracy in predicting user preferences (Wauthier, Jordan, and Jojic 2013).

Experimental Setup

The experiment design we use to compare these three rating approaches is structured as follows. We selected the 10 most popular images of paintings, obtained from the artcyclopedia.com top 10 poster sales.
We then asked crowd workers to rate them using three different methods:

  • Magnitude: using any positive integer (zero excluded).
  • Stars: choosing between 1 and 5 stars.
  • Pairwise: pairwise comparisons between two images, with no ties allowed (note that this requires 45 comparisons for our 10 images!).

We ask each worker to rate the images using all three rating systems. Since the order in which the rating systems are presented affects the outcome, we run 6 different experiments (one for each ordering of the three techniques) with 100 participants in each. We can thus analyse the bias introduced by the rating system order, and obtain results without order bias by using the aggregated data. We obtain a dataset with a total of 39,000 ratings: 45 + 10 + 10 = 65 ratings from each of the 600 participants [1].

At the end of the rating activity, we dynamically build the three painting rankings induced by the participant's choices (the pairwise ranking is obtained by a one-point tournament) and ask them which of the three rankings best reflects their preference [2] (an example screenshot is shown in Figure 1). This will reveal which kind of rating system is preferable. We also collect the time spent in each rating activity.

Figure 1: Graphical interface to let the worker express their preference on the ranking induced by their own ratings.

            Mean   Median   Preferences
Magnitude   3.74   4.0      98
Stars       3.89   4.0      107
Pairwise    4.30   5.0      243

Table 1: Mean and median rating (out of 5) of the ranking induced by the three techniques. We also report the number of times a method was preferred by workers over the others (excluding ties).

[1] The dataset is available for download at https://github.com/AlessandroChecco/PairwiseMagnitudeStars
[2] The ranking comparison is blind: There is no indication on [...]

Copyright © 2016, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
arXiv:1609.00683v1 [cs.IR] 2 Sep 2016
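The counts quoted in the experimental setup (45 pairwise comparisons, 6 experiments, 39,000 ratings) can be checked with a few lines of Python. This is only an illustrative sketch: the painting labels and variable names are hypothetical, and the figure of 100 participants per experiment is taken from the text.

```python
from itertools import combinations, permutations

# Hypothetical labels standing in for the 10 paintings.
paintings = [f"painting_{i}" for i in range(1, 11)]
techniques = ("magnitude", "stars", "pairwise")
participants_per_experiment = 100  # as stated in the text

# Every unordered pair of paintings is compared once (no ties allowed).
pairs = list(combinations(paintings, 2))

# One experiment per ordering of the three rating techniques.
orderings = list(permutations(techniques))

# Each worker gives one pairwise judgement per pair,
# plus one magnitude rating and one star rating per painting.
ratings_per_worker = len(pairs) + 2 * len(paintings)

total_ratings = ratings_per_worker * len(orderings) * participants_per_experiment

print(len(pairs), len(orderings), ratings_per_worker, total_ratings)
```

The pairwise count grows quadratically with the number of items, n(n-1)/2, which is why pairwise comparison dominates the per-worker effort here.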
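The text says the pairwise ranking is obtained by a one-point tournament but does not spell out the procedure. A minimal sketch follows, under the assumption that each painting earns one point per comparison it wins and that paintings are ranked by total points. The function name `tournament_ranking`, the input format, and the tie-breaking rule (ties keep the input order) are assumptions made for illustration, not details from the paper.

```python
from collections import Counter


def tournament_ranking(items, winners):
    """Rank items by the number of pairwise comparisons they won.

    `winners` maps each compared pair (a, b) to the preferred item.
    Assumption: one point per win; ties in points keep the input order.
    """
    points = Counter({item: 0 for item in items})
    for (a, b), winner in winners.items():
        if winner not in (a, b):
            raise ValueError(f"winner {winner!r} is not in pair {(a, b)!r}")
        points[winner] += 1
    # sorted() is stable, so items with equal points stay in input order.
    return sorted(items, key=lambda item: -points[item])


# Toy example with four hypothetical items and all six comparisons.
items = ["A", "B", "C", "D"]
winners = {
    ("A", "B"): "A",
    ("A", "C"): "A",
    ("A", "D"): "D",
    ("B", "C"): "B",
    ("B", "D"): "B",
    ("C", "D"): "C",
}
ranking = tournament_ranking(items, winners)
print(ranking)
```

In this toy example A and B each win two comparisons and C and D each win one, so the resulting order depends on the assumed tie-breaking rule; a real analysis would need to decide how such ties are handled.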

Similar resources

A Fast Algorithm for Covering Rectangular Orthogonal Polygons with a Minimum Number of r-Stars

Introduction: This paper presents an algorithm for covering orthogonal polygons with a minimal number of guards. This idea examines the minimum number of guards for orthogonal simple polygons (without holes) for all scenarios and can also find a rectangular area for each guard. We consider the problem of covering orthogonal polygons with a minimum number of r-stars. In each orthogonal polygon P,...

Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge

The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...

Pulsating red giant and supergiant stars in the Local Group dwarf galaxy Andromeda I

We have conducted an optical long-term monitoring survey of the majority of dwarf galaxies in the Local Group, with the Isaac Newton Telescope (INT), to identify the long period variable (LPV) stars. LPV stars vary on timescales of months to years, and reach the largest amplitudes of their brightness variations at optical wavelengths, due to the changing temperature. They trace stellar populati...

The Effect of Transitive Closure on the Calibration of Logistic Regression for Entity Resolution

This paper describes a series of experiments in using logistic regression machine learning as a method for entity resolution. From these experiments the authors concluded that when a supervised ML algorithm is trained to classify a pair of entity references as a linked or not linked pair, the evaluation of the model’s performance should take into account the transitive closure of its pairwise lin...

Journal title:
  • CoRR

Volume: abs/1609.00683

Publication date: 2016